Workflow models for heterogeneous distributed systems
The role of data in modern scientific workflows is becoming increasingly crucial. The unprecedented amount of data available in the digital era, combined with recent advancements in Machine Learning and High-Performance Computing (HPC), has allowed computers to surpass human performance in a wide range of fields, such as Computer Vision, Natural Language Processing and Bioinformatics. However, a solid data management strategy is essential for key aspects like performance optimisation, privacy preservation and security.
Most modern programming paradigms for Big Data analysis adhere to the principle of data locality: moving computation closer to the data to remove transfer-related overheads and risks. Still, there are scenarios in which it is worthwhile, or even unavoidable, to transfer data between different steps of a complex workflow.
The contribution of this dissertation is twofold. First, it defines a novel methodology for distributed modular applications, allowing topology-aware scheduling and data management while separating business logic, data dependencies, parallel patterns and execution environments. In addition, it introduces computational notebooks as a high-level and user-friendly interface to this new kind of workflow, aiming to flatten the learning curve and foster the adoption of the methodology.
Each of these contributions is accompanied by a full-fledged, Open Source implementation, which has been used for evaluation purposes and allows the interested reader to experience the related methodology first-hand. The validity of the proposed approaches has been demonstrated on five real scientific applications in the domains of Deep Learning, Bioinformatics and Molecular Dynamics Simulation, executed on large-scale hybrid cloud-HPC infrastructures.
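To make this separation of concerns more concrete, here is a minimal, purely illustrative Python sketch; the names and structures are hypothetical, not the dissertation's actual interface. Business logic is written as plain functions, data dependencies are declared on each step, and the binding of steps to execution locations is kept in a separate, topology-aware mapping.

# Illustrative sketch only: names and structures are hypothetical, not the
# dissertation's actual API. It keeps business logic (plain functions),
# data dependencies and execution environments as separate concerns.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    name: str
    func: Callable[..., Any]                           # business logic
    inputs: List[str] = field(default_factory=list)    # data dependencies


@dataclass
class Location:
    name: str
    kind: str                                          # e.g. "cloud" or "hpc"


def preprocess() -> str:
    return "clean-data"


def train(data: str) -> str:
    return f"model({data})"


steps = [Step("preprocess", preprocess), Step("train", train, inputs=["preprocess"])]
locations = {"k8s": Location("k8s", "cloud"), "slurm": Location("slurm", "hpc")}
# Topology-aware binding, declared apart from the business logic above.
bindings = {"preprocess": "k8s", "train": "slurm"}

results: Dict[str, Any] = {}
for step in steps:                                     # naive in-order "scheduler"
    loc = locations[bindings[step.name]]
    args = [results[dep] for dep in step.inputs]
    print(f"running {step.name} on {loc.name} ({loc.kind})")
    results[step.name] = step.func(*args)

print(results["train"])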
StreamFlow: cross-breeding cloud with HPC
Workflows are among the most commonly used tools in a variety of execution environments. Many of them target a specific environment; few of them make it possible to execute an entire workflow in different environments, e.g. Kubernetes and batch clusters. We present a novel approach to workflow execution, called StreamFlow, that complements the workflow graph with a declarative description of potentially complex execution environments, and that makes it possible to execute workflows onto multiple sites not sharing a common data space. StreamFlow is then exemplified on a novel bioinformatics pipeline for single-cell transcriptomic data analysis.
Comment: 30 pages - 2020 IEEE Transactions on Emerging Topics in Computing
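As a rough illustration of the underlying idea (a conceptual Python sketch, not StreamFlow's actual configuration format or API), each step is bound to a declaratively described deployment, and the runtime stages data explicitly whenever the producing and consuming deployments do not share a common data space.

# Conceptual sketch, not StreamFlow's real API: each step is bound to a
# declared deployment, and data is staged between deployments that do not
# share a common data space.
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Deployment:
    name: str
    kind: str          # e.g. "kubernetes", "slurm"
    data_space: str    # identifier of the storage this deployment can reach


deployments = {
    "cloud": Deployment("cloud", "kubernetes", data_space="s3-bucket"),
    "hpc": Deployment("hpc", "slurm", data_space="parallel-fs"),
}
# Declarative binding of workflow steps to deployments.
bindings: Dict[str, str] = {"align_reads": "hpc", "plot_results": "cloud"}


def run_step(step: str, input_site: str) -> str:
    target = deployments[bindings[step]]
    source = deployments[input_site]
    if source.data_space != target.data_space:
        # No shared data space: an explicit transfer is required.
        print(f"transferring inputs of '{step}': {source.data_space} -> {target.data_space}")
    print(f"executing '{step}' on {target.name} ({target.kind})")
    return bindings[step]


site = run_step("align_reads", input_site="hpc")
run_step("plot_results", input_site=site)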
Bringing AI pipelines onto cloud-HPC: setting a baseline for accuracy of COVID-19 diagnosis
HPC is an enabling platform for AI. The introduction of AI workloads into the HPC applications basket has non-trivial consequences both for the way AI applications are designed and for the way HPC computing is provided. This is the leitmotif of the convergence between HPC and AI. The formalized definition of AI pipelines is one of the milestones of HPC-AI convergence. If well conducted, it allows, on the one hand, obtaining portable and scalable applications; on the other hand, it is crucial for the reproducibility of scientific pipelines. In this work, we advocate the StreamFlow Workflow Management System as a crucial ingredient to define a parametric pipeline, called the “CLAIRE COVID-19 Universal Pipeline”, which is able to explore the optimization space of methods to classify COVID-19 lung lesions from CT scans, compare them for accuracy, and therefore set a performance baseline. The universal pipeline automates the training of many different Deep Neural Networks (DNNs) with many different hyperparameters. It therefore requires massive computing power, which is found in traditional HPC infrastructure thanks to the portability-by-design of pipelines designed with StreamFlow. Using the universal pipeline, we identified a DNN reaching over 90% accuracy in detecting COVID-19 lesions in CT scans.
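In spirit, the parametric exploration performed by such a universal pipeline resembles the following hedged Python sketch; the architectures, hyperparameters and the train_and_evaluate function are placeholders, not the actual CLAIRE pipeline code.

# Hedged sketch of a parametric training sweep: architectures and
# hyperparameters are explored and compared for accuracy to set a baseline.
# All names and values below are placeholders, not the real pipeline.
from itertools import product
import random

architectures = ["densenet121", "resnet50", "efficientnet_b0"]
learning_rates = [1e-3, 1e-4]
augmentations = [True, False]


def train_and_evaluate(arch: str, lr: float, augment: bool) -> float:
    # Placeholder: a real pipeline would train a DNN on CT scans here and
    # return its validation accuracy.
    random.seed(hash((arch, lr, augment)) % 2**32)
    return round(random.uniform(0.80, 0.95), 3)


results = {
    cfg: train_and_evaluate(*cfg)
    for cfg in product(architectures, learning_rates, augmentations)
}
best_cfg, best_acc = max(results.items(), key=lambda kv: kv[1])
print(f"baseline: {best_cfg} -> accuracy {best_acc}")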
Model-Agnostic Federated Learning
Since its debut in 2016, Federated Learning (FL) has been tied to the inner workings of Deep Neural Networks (DNNs). On the one hand, this allowed its development and widespread use as DNNs proliferated. On the other hand, it neglected all those scenarios in which using DNNs is not possible or advantageous. The fact that most current FL frameworks only allow training DNNs reinforces this problem. To address the lack of FL solutions for non-DNN-based use cases, we propose MAFL (Model-Agnostic Federated Learning). MAFL marries a model-agnostic FL algorithm, AdaBoost.F, with an open industry-grade FL framework: Intel OpenFL. MAFL is the first FL system not tied to any specific type of machine learning model, allowing exploration of FL scenarios beyond DNNs and trees. We test MAFL from multiple points of view, assessing its correctness, flexibility and scaling properties up to 64 nodes. We optimised the base software, achieving a 5.5x speedup on a standard FL scenario. MAFL is compatible with x86-64, ARM-v8, Power and RISC-V.
Comment: Published at the EuroPar'23 conference, Limassol, Cyprus
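To give a flavour of what model-agnostic means here, the following is a heavily simplified, single-process Python sketch of a federated boosting round in the spirit of AdaBoost.F; the toy datasets, the threshold-stump learner and the aggregation logic are placeholders and do not reflect MAFL's actual implementation on top of OpenFL.

# Simplified, single-process sketch of a federated boosting round in the
# spirit of AdaBoost.F. Datasets, the stump learner and the aggregation
# are placeholders, not MAFL's actual implementation on top of OpenFL.
import math
import random

random.seed(0)


def make_client(n: int):
    # 1-D toy data: the label roughly follows a threshold on x, plus noise.
    xs = [random.uniform(0, 1) for _ in range(n)]
    ys = [1 if x > 0.5 or random.random() < 0.1 else -1 for x in xs]
    return xs, ys


clients = [make_client(100) for _ in range(3)]


def train_stump(xs, ys):
    # Weak learner: best threshold on local data (any model type would do).
    best = min(
        (sum(1 for x, y in zip(xs, ys) if (1 if x > t else -1) != y), t)
        for t in (i / 20 for i in range(21))
    )
    threshold = best[1]
    return lambda x: 1 if x > threshold else -1


# 1) Each client trains a weak hypothesis on its own data.
hypotheses = [train_stump(xs, ys) for xs, ys in clients]


# 2) Every hypothesis is evaluated on every client's data; the aggregator
#    keeps the one with the lowest global error and weights it (alpha).
def global_error(h):
    errs = sum(sum(1 for x, y in zip(xs, ys) if h(x) != y) for xs, ys in clients)
    total = sum(len(xs) for xs, _ in clients)
    return errs / total


errors = [global_error(h) for h in hypotheses]
best_idx = min(range(len(errors)), key=errors.__getitem__)
eps = max(errors[best_idx], 1e-9)
alpha = 0.5 * math.log((1 - eps) / eps)
print(f"selected client {best_idx}'s hypothesis, error={eps:.3f}, alpha={alpha:.3f}")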